Analyzing Neurotransmitter Receptors & Protein Sequences

Mikel Garcia Amez, Elvin Kalinowski, Ákos Kimpián, Joshua Lembeck, Marcel Skumantz

2025-02-12

Introduction: Context

- Neurotransmitter receptors are key proteins in neuronal communication: they detect chemical signals and translate them into cellular responses that regulate a variety of biological processes.

- Many of them work as ligand-gated ion channels, and their amino acid (Aa) sequence determines properties such as ion selectivity, channel gating dynamics, and subunit organization.

- Some Aa strongly influence the structure and function of the receptors, favoring ligand recognition and channel activation.

- Building on these principles, a relationship between the Aa sequence and the receptor type can be found.

Introduction: Data Set info

The dataset consists of a curated collection of neurotransmitter receptors from different organisms, together with data on their composition, chemical properties, and structural and functional characterization.

Materials & Methods: Dirtying – Cleaning

  1. Split data
    • Combine via .row_id
  2. Insert missing values
    • Random missingness
    • Imputation sensible
  3. Insert outliers
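
The dirtying and cleaning steps above can be sketched in base R on a toy data frame. Everything here is an assumption made for illustration: the toy columns, the 10% missingness rate, the factor of 10 used for outliers, and median imputation as the "sensible" choice. Only the .row_id key and the aa_* naming come from the slides.

```r
# Minimal sketch of dirtying and cleaning on toy data (hypothetical values).
set.seed(42)

clean <- data.frame(
  .row_id = 1:20,
  Protein_Name = paste0("protein_", 1:20),
  aa_V = runif(20, 0.04, 0.10),
  aa_G = runif(20, 0.04, 0.10)
)

# 1. Split the columns into two tables that share only .row_id
part_a <- clean[, c(".row_id", "Protein_Name")]
part_b <- clean[, c(".row_id", "aa_V", "aa_G")]

# 2. Insert missing values completely at random (2 of 20 rows, assumed rate)
na_rows <- sample(nrow(part_b), size = 2)
part_b$aa_V[na_rows] <- NA

# 3. Insert outliers by inflating a few aa_G values (assumed factor of 10)
out_rows <- sample(nrow(part_b), size = 2)
part_b$aa_G[out_rows] <- part_b$aa_G[out_rows] * 10

# Cleaning: recombine via .row_id, then impute missing values with the median
dirty <- merge(part_a, part_b, by = ".row_id")
restored <- dirty
restored$aa_V[is.na(restored$aa_V)] <- median(restored$aa_V, na.rm = TRUE)
```

Because the missingness is random, a simple column-wise imputation such as the median is defensible here. For non-random missingness it would not be.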

Materials & Methods: Prediction preprocessing

  • AA composition variables (aa_*) as features
  • Receptor class as target, derived by pattern-based annotation of Protein_Name
    • Cys-loop receptors
    • Ionotropic glutamate receptors
    • Other ionotropic receptors
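
Pattern-based annotation of this kind could look roughly as follows. The protein names and the keyword patterns are hypothetical and are not the ones used in the analysis. Only the three class labels and the Protein_Name column come from the slides.

```r
# Hypothetical protein names; real Protein_Name values may look different.
protein_name <- c(
  "Glycine receptor subunit alpha-1",
  "Gamma-aminobutyric acid receptor subunit beta-2",
  "Glutamate receptor ionotropic, NMDA 1",
  "Glutamate receptor 2",
  "P2X purinoceptor 7"
)

# Assumed keyword patterns for the two named receptor families.
cys_loop_pattern <- paste(
  "glycine receptor", "gamma-aminobutyric",
  "acetylcholine receptor", "5-hydroxytryptamine receptor 3",
  sep = "|"
)
iglu_pattern <- "glutamate receptor"

# The first matching pattern wins; everything else falls into the third class.
receptor_class <- ifelse(
  grepl(cys_loop_pattern, protein_name, ignore.case = TRUE),
  "Cys-loop receptors",
  ifelse(
    grepl(iglu_pattern, protein_name, ignore.case = TRUE),
    "Ionotropic glutamate receptors",
    "Other ionotropic receptors"
  )
)
receptor_class <- factor(receptor_class)
table(receptor_class)
```

Keyword patterns this loose would also match metabotropic glutamate or muscarinic acetylcholine receptors. That is only acceptable if the dataset contains ionotropic receptors exclusively.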

Materials & Methods: PCA

  • Examine the structure of the data before modelling
  • Implemented with tidymodels
  • Scatter plot of PC1 vs. PC2
#...
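# Mark Protein_ID and Receptor_class as identifiers so they are excluded from the PCA
pca_rec <- recipe(~., data = prediction_df) |>
  update_role(Protein_ID, Receptor_class,
              new_role = "id") |>
  step_normalize(all_predictors()) |>  # center and scale each predictor
  step_pca(all_predictors())           # project onto principal components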
#...
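
For readers without tidymodels, the same idea can be sketched in base R: prcomp() with centering and scaling plays the role of step_normalize() followed by step_pca(). The toy data frame below is a hypothetical stand-in for prediction_df, with made-up values and only three aa_* columns.

```r
set.seed(1)

# Toy stand-in for prediction_df (hypothetical values and class labels)
toy_df <- data.frame(
  Protein_ID = paste0("P", 1:30),
  Receptor_class = rep(c("A", "B", "C"), each = 10),
  aa_V = rnorm(30, 0.07, 0.01),
  aa_G = rnorm(30, 0.06, 0.01),
  aa_W = rnorm(30, 0.02, 0.005)
)

# Only the amino acid composition columns enter the PCA
aa_cols <- grep("^aa_", names(toy_df), value = TRUE)

# center/scale. standardize the predictors before the decomposition
pca_fit <- prcomp(toy_df[, aa_cols], center = TRUE, scale. = TRUE)
scores <- as.data.frame(pca_fit$x)

# Share of variance explained by each component
var_explained <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)

# PC1 vs. PC2 scatter, colored by receptor class
plot(scores$PC1, scores$PC2,
     col = as.integer(factor(toy_df$Receptor_class)),
     xlab = "PC1", ylab = "PC2")
```

Checking var_explained alongside the scatter shows how much of the total variance the first two components actually capture.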

Materials & Methods: Predictive Modeling

  • Stratified 80/20 train–test split to preserve class proportions
  • Random Forest classifier with 1000 trees
  • Basic metrics and Mean Decrease Gini (MDG) as feature importance
#...
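# Recipe for the classifier: Receptor_class is the outcome, Protein_ID only an identifier
rf_rec <- recipe(Receptor_class ~ ., data = prediction_df) |>
  update_role(Protein_ID, new_role = "id") |>
  step_normalize(all_predictors())

# Random forest with 1000 trees, fitted through the randomForest engine
rf_spec <- rand_forest(trees = 1000) |>
  set_engine("randomForest") |>
  set_mode("classification")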
#...
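
What the stratified 80/20 split does can be illustrated in base R. In tidymodels the usual route is rsample's initial_split() with prop = 0.8 and a strata argument. The class counts below are hypothetical and chosen only to show an imbalanced case.

```r
set.seed(123)

# Toy class labels with an imbalanced distribution (hypothetical counts)
receptor_class <- factor(rep(
  c("Cys-loop", "Glutamate", "Other"),
  times = c(50, 40, 10)
))

# Draw 80% of the row indices within each class separately
idx_by_class <- split(seq_along(receptor_class), receptor_class)
train_idx <- unlist(lapply(idx_by_class, function(idx) {
  idx[sample.int(length(idx), size = round(0.8 * length(idx)))]
}))
test_idx <- setdiff(seq_along(receptor_class), train_idx)

# Class proportions are the same in both partitions
prop.table(table(receptor_class[train_idx]))
prop.table(table(receptor_class[test_idx]))
```

Sampling within each class keeps the rare class present in the test set. A plain random split could leave it with very few or no test cases.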

Results: PCA and classification results

Results: Length and weight correlation analysis

Discussion: Biological Interpretation

  • Valine, Glycine, Tryptophan, Serine, and Proline are the most discriminative
    • Valine is common in β-sheets
    • Glycine, Serine, and Proline are common in turns and coils
    • Cys-loop receptors contain an aromatic cage in the ligand-binding domain (Tryptophan)
  • However, feature importance should be interpreted cautiously

Cys-loop Receptor

Ionotropic Glutamate Receptor

Discussion: Limitations and Future Directions

  • It is possible to distinguish receptor types from the amino acid composition alone
    • Prediction of Cys-loop and ionotropic glutamate receptors is very accurate
    • Other receptors are not well predicted because they are underrepresented in the dataset
  • Things that could be pursued in more detail:
    • Train a model against a larger variety of background proteins
    • Add AA sequence data to gain insight into structural features for more precise prediction